Goto

Collaborating Authors

 rgb channel


Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers

Neural Information Processing Systems

Multi-Channel Imaging (MCI) contains an array of challenges for encoding useful feature representations not present in traditional images. For example, images from two different satellites may both contain RGB channels, but the remaining channels can be different for each imaging source. Thus, MCI models must support a variety of channel configurations at test time. Recent work has extended traditional visual encoders for MCI, such as Vision Transformers (ViT), by supplementing pixel information with an encoding representing the channel configuration. However, these methods treat each channel equally, i.e., they do not consider the unique properties of each channel type, which can result in needless and potentially harmful redundancies in the learned features.


Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers

Neural Information Processing Systems

Multi-Channel Imaging (MCI) contains an array of challenges for encoding useful feature representations not present in traditional images. For example, images from two different satellites may both contain RGB channels, but the remaining channels can be different for each imaging source. Thus, MCI models must support a variety of channel configurations at test time. Recent work has extended traditional visual encoders for MCI, such as Vision Transformers (ViT), by supplementing pixel information with an encoding representing the channel configuration. However, these methods treat each channel equally, i.e., they do not consider the unique properties of each channel type, which can result in needless and potentially harmful redundancies in the learned features.


Feature-Augmented Deep Networks for Multiscale Building Segmentation in High-Resolution UAV and Satellite Imagery

Maniyar, Chintan B., Kumar, Minakshi, Mai, Gengchen

arXiv.org Artificial Intelligence

Accurate building segmentation from high-resolution RGB imagery remains challenging due to spectral similarity with non-building features, shadows, and irregular building geometries. In this study, we present a comprehensive deep learning framework for multiscale building segmentation using RGB aerial and satellite imagery with spatial resolutions ranging from 0.4m to 2.7m. We curate a diverse, multi-sensor dataset and introduce feature-augmented inputs by deriving secondary representations including Principal Component Analysis (PCA), Visible Difference Vegetation Index (VDVI), Morphological Building Index (MBI), and Sobel edge filters from RGB channels. These features guide a Res-U-Net architecture in learning complex spatial patterns more effectively. We also propose training policies incorporating layer freezing, cyclical learning rates, and SuperConvergence to reduce training time and resource usage. Evaluated on a held-out WorldView-3 image, our model achieves an overall accuracy of 96.5%, an F1-score of 0.86, and an Intersection over Union (IoU) of 0.80, outperforming existing RGB-based benchmarks. This study demonstrates the effectiveness of combining multi-resolution imagery, feature augmentation, and optimized training strategies for robust building segmentation in remote sensing applications.


Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy

Zhu, Qinfeng, Cao, Jiaze, Cai, Yuanzhi, Fan, Lei

arXiv.org Artificial Intelligence

Point cloud semantic segmentation, the process of classifying each point into predefined categories, is essential for 3D scene understanding. While image-based segmentation is widely adopted due to its maturity, methods relying solely on RGB information often suffer from degraded performance due to color inaccuracies. Recent advancements have incorporated additional features such as intensity and geometric information, yet RGB channels continue to negatively impact segmentation accuracy when errors in colorization occur. Despite this, previous studies have not rigorously quantified the effects of erroneous colorization on segmentation performance. In this paper, we propose a novel statistical approach to evaluate the impact of inaccurate RGB information on image-based point cloud segmentation. We categorize RGB inaccuracies into two types: incorrect color information and similar color information. Our results demonstrate that both types of color inaccuracies significantly degrade segmentation accuracy, with similar color errors particularly affecting the extraction of geometric features. These findings highlight the critical need to reassess the role of RGB information in point cloud segmentation and its implications for future algorithm design.


Model Monitoring and Robustness of In-Use Machine Learning Models: Quantifying Data Distribution Shifts Using Population Stability Index

Khademi, Aria, Hopka, Michael, Upadhyay, Devesh

arXiv.org Artificial Intelligence

Safety goes first. Meeting and maintaining industry safety standards for robustness of artificial intelligence (AI) and machine learning (ML) models require continuous monitoring for faults and performance drops. Deep learning models are widely used in industrial applications, e.g., computer vision, but the susceptibility of their performance to environment changes (e.g., noise) \emph{after deployment} on the product, are now well-known. A major challenge is detecting data distribution shifts that happen, comparing the following: {\bf (i)} development stage of AI and ML models, i.e., train/validation/test, to {\bf (ii)} deployment stage on the product (i.e., even after `testing') in the environment. We focus on a computer vision example related to autonomous driving and aim at detecting shifts that occur as a result of adding noise to images. We use the population stability index (PSI) as a measure of presence and intensity of shift and present results of our empirical experiments showing a promising potential for the PSI. We further discuss multiple aspects of model monitoring and robustness that need to be analyzed \emph{simultaneously} to achieve robustness for industry safety standards. We propose the need for and the research direction toward \emph{categorizations} of problem classes and examples where monitoring for robustness is required and present challenges and pointers for future work from a \emph{practical} perspective.


VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach

Kerroumi, Mohamed, Sayem, Othmane, Shabou, Aymen

arXiv.org Artificial Intelligence

We introduce a novel approach for scanned document representation to perform field extraction. It allows the simultaneous encoding of the textual, visual and layout information in a 3D matrix used as an input to a segmentation model. We improve the recent Chargrid and Wordgrid models in several ways, first by taking into account the visual modality, then by boosting its robustness in regards to small datasets while keeping the inference time low. Our approach is tested on public and private document-image datasets, showing higher performances compared to the recent state-of-the-art methods.


Deep learning on large images: challenges and CNN

#artificialintelligence

Applying deep learning on large images is always a challenge but there a solution using convolutional but first, let's understand in brief where is the challenge? So one of the challenges in computer vision is that inputs become very big as we increase image size. Consider the example of a basic image classification problem e.g.; cat detection. Let's take an input 64 64 image of a cat and try to figure out if that is a cat or not? to do that we'll need 64x64x3, (where 3 is the no of RGB channel) parameters, so the x input feature will have 12288 dimensions though this is not a very large number considering just the 64 64 image size which is very small there are lots of input features to deal with.. This is just for a 1MB image but in computer vision problem you don't want to stick with using just tiny images using bigger images results in overfitting and huge input feature vector so here comes convolution operation which is the basic building block of Convolutional Neural Network (CNN).